A validation method for fuzzy clustering of gene expression data

Authors

  • Thanh Le
  • Katheleen J. Gardiner
Abstract

Clustering is a key process in data mining for revealing structure and patterns in data. Fuzzy C-means (FCM) is a popular algorithm using a partitioning approach for clustering. One advantage of FCM is that it converges rapidly. In addition, using fuzzy sets to represent the degrees of cluster membership of each data point provides more information regarding relationships within the data than do alternative approaches that use crisp clustering. However, a limitation of FCM is that it requires initial specification of the number of clusters and subsequent validation of this number. Here, we propose a Bayesian method for fuzzy clustering validation using the fuzzy partition. We show that this method outperforms popular fuzzy cluster indices on both artificial and real biological datasets. Availability: The supplementary documents and the method software are at http://ouray.ucdenver.edu/~tnle/fzble.
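The abstract gives no algorithmic detail, so the following is only a minimal sketch of the setting it describes: a basic fuzzy C-means loop that produces a fuzzy partition, followed by Bezdek's partition coefficient as one example of a conventional fuzzy cluster index. It is not the Bayesian validation method proposed in the paper, and the abstract does not say which indices were used for comparison. The function names, the fuzzifier default `m=2.0`, the stopping rule, and the toy data are all assumptions made for illustration.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Basic fuzzy C-means; returns (centers, memberships).

    memberships[i][k] is the degree to which point i belongs to cluster k.
    All parameter names and defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Random initial memberships, normalized so each row sums to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    exponent = 2.0 / (m - 1.0)
    for _ in range(max_iter):
        # Center update: mean of the points weighted by u_ik ** m.
        centers = []
        for k in range(c):
            weights = [u[i][k] ** m for i in range(n)]
            total = sum(weights)
            centers.append([
                sum(w * p[d] for w, p in zip(weights, points)) / total
                for d in range(dim)
            ])

        # Membership update from relative distances to the centers.
        new_u = []
        for p in points:
            # Clamp distances to avoid division by zero at a center.
            dists = [max(math.dist(p, ctr), 1e-12) for ctr in centers]
            new_u.append([
                1.0 / sum((dists[k] / dists[j]) ** exponent for j in range(c))
                for k in range(c)
            ])

        # Stop once the largest membership change falls below tol.
        change = max(
            abs(new_u[i][k] - u[i][k]) for i in range(n) for k in range(c)
        )
        u = new_u
        if change < tol:
            break
    return centers, u


def partition_coefficient(u):
    """Bezdek's partition coefficient: mean squared membership, in [1/c, 1]."""
    return sum(v * v for row in u for v in row) / len(u)


# Toy data (hypothetical): two well-separated groups of 2-D points.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1)]

# The number of clusters must be supplied up front, then validated.
for c in (2, 3, 4):
    _, memberships = fuzzy_c_means(points, c)
    print(c, round(partition_coefficient(memberships), 4))
```

The loop at the end mirrors the limitation the abstract points out: the number of clusters has to be chosen before FCM runs, so candidate values are scored afterwards from the fuzzy partition alone. Index-based scores such as the partition coefficient are the kind of baseline the proposed Bayesian method is compared against.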

Similar resources

Modification of the Fast Global K-means Using a Fuzzy Relation with Application in Microarray Data Analysis

Recognizing genes with distinctive expression levels can help in the prevention, diagnosis and treatment of diseases at the genomic level. In this paper, fast global k-means (fast GKM) is developed for clustering gene expression datasets. Fast GKM is a significant improvement on the k-means clustering method. It is an incremental clustering method that starts with one cluster. Iteratively ...

Evolutionary fuzzy cluster analysis with Bayesian validation of gene expression profiles

Cluster analysis of gene expression profiles has been used to identify the functions of unknown genes. Fuzzy clustering, one category of clustering methods, assigns each sample to multiple clusters with varying degrees of membership. It is more appropriate for analyzing gene expression profiles because genes usually belong to multiple functional families. However, general clusteri...

Bayesian Validation of Fuzzy Clustering for Analysis of Yeast Cell Cycle Data

Clustering of gene expression profiles has been used to identify the functions of known and unknown genes. Since genes usually belong to multiple functional families, fuzzy clustering methods are more appropriate than conventional hard clustering methods. However, a natural way to measure the quality of the cluster partitions still needs to be devised ...

A Fuzzy C-means Algorithm for Clustering Fuzzy Data and Its Application in Clustering Incomplete Data

The fuzzy c-means clustering algorithm is a useful tool for clustering, but it is suitable only for crisp, complete data. In this article, an enhancement of the algorithm is proposed that is suitable for clustering trapezoidal fuzzy data. A linear ranking function is used to define a distance for trapezoidal fuzzy data. Then, as an application, a method based on the proposed algorithm is pres...

Clustering gene expression data using random forest dissimilarity

Background: The clustering of gene expression data plays an important role in the diagnosis and treatment of cancer. These kinds of data typically involve a large number of variables (genes) in comparison with the number of samples (patients). Many clustering methods have been built on the dissimilarity among observations, which is calculated by a distance function. As increa...

Publication date: 2011